# Image caption generation
## Gemma 3 4B IT QAT 4bit

Author: mlx-community · License: Other · Task: Image-to-Text · Tags: Transformers, Other · Downloads: 607 · Likes: 1

Gemma 3 4B IT QAT 4bit is a 4-bit quantized large language model trained with Quantization-Aware Training (QAT), based on the Gemma 3 architecture and optimized for the MLX framework.

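As a rough sketch, a caption could be generated from an MLX build like this with the community mlx-vlm package; the repo ID, image path, and the exact load/generate call shapes are assumptions and may differ between mlx-vlm versions.

```python
# Sketch: captioning with an MLX-quantized Gemma 3 model via mlx-vlm (pip install mlx-vlm).
# The repo ID and the generate() argument order are assumptions; check the mlx-vlm README for your version.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_id = "mlx-community/gemma-3-4b-it-qat-4bit"  # assumed repo ID
model, processor = load(model_id)
config = load_config(model_id)

images = ["photo.jpg"]  # placeholder local path or URL
prompt = apply_chat_template(processor, config, "Describe this image in one sentence.", num_images=len(images))
print(generate(model, processor, prompt, images, verbose=False))
```
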
## Florence 2 Base GPT4 Captioner V1

Author: Vimax97 · License: MIT · Task: Image-to-Text · Tags: Transformers, Multilingual · Downloads: 224 · Likes: 0

A GPT-4o-style caption generator fine-tuned from Florence-2-base-ft for generating image descriptions.

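The Florence-2 prompt-based captioning flow this fine-tune builds on looks roughly like the sketch below in transformers; it is shown with the upstream microsoft/Florence-2-base-ft checkpoint, and the task prompt the fine-tune actually expects is an assumption.

```python
# Sketch: prompt-based captioning with a Florence-2 checkpoint in transformers.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base-ft"  # upstream base; swap in the fine-tuned repo as needed
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")
task = "<MORE_DETAILED_CAPTION>"  # Florence-2 task token; "<CAPTION>" gives a shorter description
inputs = processor(text=task, images=image, return_tensors="pt")
ids = model.generate(input_ids=inputs["input_ids"], pixel_values=inputs["pixel_values"],
                     max_new_tokens=256, num_beams=3)
text = processor.batch_decode(ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(text, task=task, image_size=image.size))
```
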
## Llama 3.2 11B Vision Instruct NF4

Author: SeanScripts · Task: Image-to-Text · Tags: Transformers · Downloads: 658 · Likes: 12

A 4-bit (NF4) quantized version of meta-llama/Llama-3.2-11B-Vision-Instruct, supporting image understanding and text generation tasks.

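A hedged sketch of the usual transformers Mllama inference flow for a Llama 3.2 Vision Instruct checkpoint; the repo ID, image path, and prompt are placeholders, and transformers 4.45 or newer is assumed.

```python
# Sketch: single-image description with a Llama 3.2 Vision Instruct checkpoint.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "SeanScripts/Llama-3.2-11B-Vision-Instruct-nf4"  # assumed repo ID
model = MllamaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image in one sentence."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=80)
print(processor.decode(out[0], skip_special_tokens=True))
```
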
## Tvl Mini 0.1

Author: 2Vasabi · License: Apache-2.0 · Task: Image-to-Text · Tags: Transformers, Multilingual · Downloads: 23 · Likes: 2

A LoRA fine-tune of the Qwen2-VL-2B model for Russian, supporting multimodal tasks.

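For context, the upstream Qwen2-VL inference flow that such a LoRA sits on top of looks roughly like the sketch below; the base repo ID is assumed, and the adapter itself would be attached with peft on top of the base model.

```python
# Sketch: Qwen2-VL-2B inference; a LoRA adapter would be applied on top of the
# base model, e.g. with peft.PeftModel.from_pretrained(model, adapter_repo).
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

base_id = "Qwen/Qwen2-VL-2B-Instruct"  # assumed base checkpoint
model = Qwen2VLForConditionalGeneration.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(base_id)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "photo.jpg"},
    {"type": "text", "text": "Опиши это изображение."},  # Russian: "Describe this image."
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, out)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```
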
## Zcabnzh Bp

Author: nanxiz · License: BSD-3-Clause · Task: Image-to-Text · Tags: Transformers · Downloads: 19 · Likes: 0

BLIP is a unified vision-language pretraining framework, excelling in tasks like image caption generation and visual question answering, with performance enhanced by innovative data filtering mechanisms.

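For reference, plain BLIP captioning with transformers looks roughly like the sketch below, shown with the widely used Salesforce/blip-image-captioning-base checkpoint; the image path is a placeholder.

```python
# Sketch: BLIP image captioning with transformers.
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

model_id = "Salesforce/blip-image-captioning-base"  # reference BLIP captioning checkpoint
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")

# Unconditional captioning; passing a text prefix as well would condition the caption on it.
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(out[0], skip_special_tokens=True))
```
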
## Florence 2 Large Ft

Author: andito · License: MIT · Task: Image-to-Text · Tags: Transformers · Downloads: 93 · Likes: 4

Florence-2 is an advanced vision foundation model developed by Microsoft, employing a prompt-based approach to handle various vision and vision-language tasks.

## PaliGemma Rich Captions

Author: gokaygokay · License: Apache-2.0 · Task: Image-to-Text · Tags: Transformers, English · Downloads: 66 · Likes: 9

An image caption generation model fine-tuned from PaliGemma-3b on the DocCI dataset, capable of generating detailed descriptions of 200-350 characters with reduced hallucination.

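The standard PaliGemma captioning flow that this fine-tune follows is sketched below; the fine-tuned repo ID and the exact prompt it expects are assumptions.

```python
# Sketch: PaliGemma-style captioning with transformers.
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "gokaygokay/paligemma-rich-captions"  # assumed repo ID for this fine-tune
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")
prompt = "caption en"  # standard PaliGemma captioning prompt; the fine-tune may expect a different one
inputs = processor(text=prompt, images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Strip the prompt tokens from the front of the generated sequence before decoding.
generated = out[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))
```
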
## Spydazwebai Image Projectors

Author: LeroyDyer · Task: Image-to-Text · Tags: Multilingual · Downloads: 560 · Likes: 1

An image-to-text model based on the Transformers library, capable of converting image content into descriptive text, particularly suitable for the art domain.

## UForm Gen2 Qwen 500m

Author: unum-cloud · License: Apache-2.0 · Task: Image-to-Text · Tags: Transformers, English · Downloads: 17.98k · Likes: 76

UForm-Gen is a small generative vision-language model primarily used for image caption generation and visual question answering.

## Git Base One Piece

Author: ayoubkirouane · License: MIT · Task: Image-to-Text · Tags: Transformers, Multilingual · Downloads: 16 · Likes: 0

A vision-language model fine-tuned from Microsoft's git-base model, specifically designed to generate descriptive text captions for images from the anime 'One Piece'.

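The underlying GIT captioning flow is sketched below with the upstream microsoft/git-base checkpoint; the fine-tuned repo ID and image path are placeholders.

```python
# Sketch: GIT-style image captioning with transformers.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/git-base"  # upstream checkpoint; swap in the One Piece fine-tune as needed
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
ids = model.generate(pixel_values=pixel_values, max_length=50)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```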